Abstract

Two new indices to detect answer copying on a multiple-choice test, $S_1$ and $S_2$, were proposed. The $S_1$ index is similar to the $K$ index (Holland, 1996) and the $K_2$ index (Sotaridona & Meijer, in press), but the distribution of the number of matching incorrect answers of the source (examinee $s$) and the copier (examinee $c$) is modeled by the Poisson distribution instead of the binomial distribution to improve the detection rate of $K$ and $K_2$. The $S_2$ index was proposed to overcome a limitation of the $K$ and $K_2$ indices, namely, their insensitivity to the copying of correct answers. The $S_2$ index incorporates the matching correct answers in addition to the matching incorrect answers. A simulation study was conducted to investigate the usefulness of $S_1$ and $S_2$ for 40- and 80-item tests, sample sizes of 100 and 500, and 10%, 20%, 30%, and 40% answer copying. The Type I errors and detection rates of $S_1$ and $S_2$ were compared with those of $K_2$ and the $\omega$ copying index (Wollack, 1997). Results showed that all four indices were able to maintain their Type I errors, with $S_1$ and $K_2$ being slightly conservative compared to $S_2$ and $\omega$. Furthermore, $S_1$ had higher detection rates than $K_2$. The $S_2$ index showed a significant improvement in detection rate compared to $K$ and $K_2$.

Key Words: nominal response model, copying indices, cheating, Poisson distribution, 
loglinear model 
Cheating on tests has a long and rich tradition (Cizek, 1999, Chap. 5). Cheating methods include using forbidden materials, circumventing the testing process, and even using microrecorders. In the present study, we are concerned with answer copying. In this type of cheating, one examinee copies the answers of another examinee, which may take place through all kinds of codes for transmitting answers, for example, clicking of pens, tapping of the foot, and the like. Thus the examinees do not have to be physically near each other. Because answer copying may invalidate an examinee's test score, it is necessary to prevent such practices by using well-instructed proctors and by arranging the seating so that there is ample room between examinees. However, if a proctor observes irregularities, statistical methods may be used to obtain additional evidence of answer copying.

Several methods have been proposed, all of which are based on determining the probability that the response patterns of two examinees under suspicion are as similar as observed by chance alone. Similarity that is very unlikely to arise by chance may indicate answer copying. These chance methods can be classified into two types (Cizek, 1999, pp. 138-139). One type compares an observed pattern of responses to a known theoretical distribution (e.g., Frary, Tideman, & Watts, 1977; Wollack, 1997). In the second type, the probability of an observed pattern is compared with a distribution of values derived from independent pairs of examinees who took the same test. An example of such a statistic is the $K$ index (Holland, 1996).

Sotaridona and Meijer (in press) investigated the statistical properties of different forms of the $K$ index and compared their detection rates with that of the $\omega$ index (Wollack, 1997). The major difference between the indices is that the $K$ index does not assume any test model, whereas $\omega$ is based on item response theory modeling (e.g., van der Linden & Hambleton, 1997). Sotaridona and Meijer (in press) argued that the $K$ index is less sensitive to answer copying when the source and the copier have many matching correct answers. Lewis and Thayer (1998) and Sotaridona and Meijer (in press) found that the $K$ index, which uses the binomial distribution to model the matching incorrect answers, had low power to detect substantial amounts of copying.
In this paper, we propose an index, $S_2$, that takes both the matching correct and the matching incorrect answers into account. Furthermore, we discuss an index, $S_1$, whose mathematical form is similar to that of the $K$ index (Holland, 1996) and the $K_2$ index (Sotaridona & Meijer, in press), but in which the distribution of the number of matching incorrect answers of the source and the copier is modeled by the Poisson distribution.

This study is organized as follows. First, we introduce the $K_2$ index and the $\omega$ index. Second, we discuss two new indices, $S_1$ and $S_2$, that may be used to obtain additional evidence of answer copying. Third, we report a simulation study that investigated the Type I error rates and detection rates of $S_1$ and $S_2$.

Existing Copying Indices 

In this study, the copying indices $\omega$ (Wollack, 1997) and $K_2$ (Sotaridona & Meijer, in press) are compared with the newly proposed copying indices, $S_1$ and $S_2$, with respect to Type I errors and detection rates. A brief description of $\omega$ and $K_2$ is given below, followed by a more elaborate discussion of $S_1$ and $S_2$. The reader is referred to Sotaridona and Meijer (in press) and Wollack (1997) for a more detailed treatment of $K_2$ and $\omega$, respectively.

The $\omega$ Index

Let examinee $c$, the copier, be suspected of copying answers from examinee $s$, the source. In a multiple-choice test with options $v = 1, 2, \ldots, k, \ldots, V$, let $h_{cs}$ be the number of items $i = 1, 2, \ldots, I$ on which the response of $c$ matches the response of $s$. Given that the response of $s$ on item $i$ is $k$, let $P_{ik}(\theta_c)$ denote the probability that $c$ selects the same option $k$ on item $i$. Wollack (1997) used the nominal response model (Bock, 1972) to obtain this probability, which is given by



$$P_{ik}(\theta_c) = \frac{\exp(\zeta_{ik} + \lambda_{ik}\theta_c)}{\sum_{v=1}^{V} \exp(\zeta_{iv} + \lambda_{iv}\theta_c)}, \tag{1}$$
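where $\zeta_{ik}$ and $\lambda_{ik}$ are the intercept and slope parameters. The expected value of $h_{cs}$, conditional on the ability level of the copier ($\theta_c$), the item response vector of the source $\mathbf{U}_s = (U_{1s}, \ldots, U_{Is})$, where $U_{is}$ is the response to item $i$, and the item parameters $\boldsymbol{\xi} = (\boldsymbol{\xi}_1, \ldots, \boldsymbol{\xi}_I)$ with $\boldsymbol{\xi}_i = (\zeta_{i1}, \ldots, \zeta_{iV}, \lambda_{i1}, \ldots, \lambda_{iV})$, is computed as

$$E(h_{cs} \mid \theta_c, \mathbf{U}_s, \boldsymbol{\xi}) = \sum_{i=1}^{I} P_{ik}(\theta_c), \tag{2}$$

where, for each item, $k = U_{is}$ is the option chosen by the source.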
and the standard deviation of $h_{cs}$ is

$$\sigma_{h_{cs}} = \sqrt{\sum_{i=1}^{I} P_{ik}(\theta_c)\,[1 - P_{ik}(\theta_c)]}. \tag{3}$$
The $\omega$ index is based on the residual between the observed and the expected value of $h_{cs}$. This residual, standardized by its standard deviation, defines $\omega$, which is asymptotically standard normally distributed (Wollack, 1997). The larger the value of $\omega$, the stronger the evidence that $c$ copied from $s$. The $\omega$ statistic is given by

$$\omega = \frac{h_{cs} - E(h_{cs} \mid \theta_c, \mathbf{U}_s, \boldsymbol{\xi})}{\sigma_{h_{cs}}}. \tag{4}$$
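To make Equations (1) through (4) concrete, the following is a minimal Python sketch of the $\omega$ computation, assuming the nominal response model parameters are known (as they are in the simulation reported later). The function names, the 0-based option coding, and the representation of each item as a pair of intercept and slope lists are illustrative assumptions, not part of Wollack's (1997) specification. The standard deviation is taken as $\sqrt{\sum_i P_{ik}(1-P_{ik})}$, the standard deviation of a sum of independent Bernoulli match indicators.

```python
import math

def nrm_probs(theta, zeta, lam):
    """Nominal response model (Equation 1): probabilities of all options
    of one item, given intercepts zeta[v] and slopes lam[v]."""
    logits = [z + a * theta for z, a in zip(zeta, lam)]
    top = max(logits)  # subtract the maximum for numerical stability
    expo = [math.exp(x - top) for x in logits]
    total = sum(expo)
    return [e / total for e in expo]

def omega_index(copier_resp, source_resp, theta_c, items):
    """Wollack's omega (Equations 2-4) for one copier-source pair.

    copier_resp, source_resp: chosen options per item (0-based codes).
    theta_c: ability of the suspected copier.
    items: one (zeta, lam) pair of lists per item.
    """
    # observed number of identical responses, h_cs
    h_cs = sum(1 for u_c, u_s in zip(copier_resp, source_resp) if u_c == u_s)
    expected = 0.0
    variance = 0.0
    for (zeta, lam), k in zip(items, source_resp):
        p = nrm_probs(theta_c, zeta, lam)[k]  # P_ik(theta_c), k = source's option
        expected += p
        variance += p * (1.0 - p)  # Bernoulli variance of one match indicator
    return (h_cs - expected) / math.sqrt(variance)
```

As a check, with three equally attractive options on four items ($P_{ik} = 1/3$) and a copier who matches the source on all four, $\omega = (4 - 4/3)/\sqrt{8/9} = 2\sqrt{2} \approx 2.83$.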

The $K_2$ Index

Define the number incorrect groups $r = 1, 2, \ldots, c', \ldots, R$ such that the examinees $j = 1, 2, \ldots, J_r$ in group $r$ have the same number of wrong answers, and let $c'$ indicate the group membership of $c$. The number of examinees in number incorrect group $r$ is denoted by $J_r$, so that $J_{c'}$ is the number of examinees with the same number of wrong answers as examinee $c$. Consequently, the two-letter index $rj$ will be used to indicate examinee $j$ in number incorrect group $r$. Let $U_{irj}$ be the response of examinee $rj$ to item $i$, and let $W_s$ be the set of items, of size $w_s$, answered incorrectly by $s$.
For each examinee $rj$, define an indicator variable $A_{irj}$ equal to 1 if $U_{irj} = U_{is}$, and 0 otherwise. Note that the item response of $s$ is indexed by $is$, indicating that $s$ does not belong to any number incorrect group. The number of matching incorrect answers of $rj$ and $s$, denoted by $M_{rj}$, is then defined as

$$M_{rj} = \sum_{i \in W_s} A_{irj}. \tag{5}$$
For a particular $s$-$c$ pair, $M_{rj}$ is observed for each examinee $rj$. For simplicity, $M_{rj}$ will be denoted by $M$ when it is not necessary to identify the examinee.
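The bookkeeping behind Equation (5) and the number incorrect groups can be sketched in Python as follows. This sketch assumes complete response vectors (no omitted items), 0-based option codes, and a known answer key; the helper names are hypothetical.

```python
from collections import defaultdict

def number_incorrect(resp, key):
    """Number of wrong answers in one response vector."""
    return sum(1 for u, k in zip(resp, key) if u != k)

def matching_incorrect(responses, key, source):
    """M_rj of Equation (5) for every examinee other than the source.

    responses: list of response vectors (0-based option codes).
    key: correct option for each item.
    source: index of the source examinee s.
    """
    src = responses[source]
    w_items = [i for i, (u, k) in enumerate(zip(src, key)) if u != k]  # W_s
    return {
        j: sum(1 for i in w_items if resp[i] == src[i])  # sum of A_irj over W_s
        for j, resp in enumerate(responses)
        if j != source
    }

def group_by_incorrect(responses, key, exclude=()):
    """Number incorrect groups as {number of wrong answers: [examinees]}."""
    groups = defaultdict(list)
    for j, resp in enumerate(responses):
        if j not in exclude:
            groups[number_incorrect(resp, key)].append(j)
    return dict(groups)
```

Only the items in $W_s$ are inspected, so an examinee who shares the source's correct answers contributes nothing to $M$, which is exactly the insensitivity discussed later for $K$ and $K_2$.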

The $K_2$ index is similar to the $K$ index discussed by Holland (1996). For example, both indices are based on the random variable $M$, and both are computed as a sum of binomial probabilities,

$$\Pr(M_{c'c} \ge m_{c'c}) = \sum_{w=m_{c'c}}^{w_s} \binom{w_s}{w} p^{w} (1-p)^{w_s - w}, \tag{6}$$
where $M_{c'c}$, with realization $m_{c'c}$, is the number of matching wrong answers between $c$ and $s$, and $p$ is the success probability parameter of the binomial distribution. The rationale behind the choice of the binomial distribution for $M$ is discussed in Holland (1996). The main difference between $K$ and $K_2$ is the way $p$ is estimated. For the $K$ index, $p$ is estimated by $\hat{p} = \bar{M}_{c'}/w_s$, where $\bar{M}_{c'}$ is the mean of $M_{c'j}$ over the examinees in number incorrect group $c'$ and $w_s$ is the number of wrong answers of the source (Holland, 1996). For the $K_2$ index, $p$ is estimated by $\hat{p}_2 = E(\beta_0 + \beta_1 Q_r + \beta_2 Q_r^2 + \varepsilon_r)$, where $Q_r$ is the proportion of wrong answers of examinees in number incorrect group $r$. The parameters $\beta_0$, $\beta_1$, and $\beta_2$ are regression coefficients, and $\varepsilon_r$ is an error term that is assumed to be normally distributed with mean 0 and variance $\sigma^2$.

Note that $\hat{p}$ is obtained using data in number incorrect group $c'$ only, whereas $\hat{p}_2$ is obtained using all relevant information from the $R$ number incorrect groups. In this sense, $\hat{p}_2$ contains more information than $\hat{p}$ and therefore gives a better estimate of $p$ than $\hat{p}$.
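A Python sketch of the binomial tail in Equation (6), with $p$ estimated as $\hat{p} = \bar{M}_{c'}/w_s$ for the $K$ index, follows. Passing a regression-based $\hat{p}_2$ in place of $\hat{p}$ would give the tail probability used by $K_2$. The function names are hypothetical, and which examinees of group $c'$ enter the mean is left to the caller.

```python
import math

def binomial_upper_tail(m, w_s, p):
    """Pr(M >= m) for M ~ Binomial(w_s, p): the sum in Equations (6) and (7)."""
    return sum(math.comb(w_s, w) * p**w * (1.0 - p) ** (w_s - w)
               for w in range(m, w_s + 1))

def k_index(m_cc, w_s, group_matches):
    """Holland's K index with p estimated as mean(M in group c') / w_s.

    m_cc: observed matching incorrect answers between c and s.
    w_s: number of wrong answers of the source.
    group_matches: M values of the examinees in the copier's group c'.
    """
    p_hat = sum(group_matches) / (len(group_matches) * w_s)
    return binomial_upper_tail(m_cc, w_s, p_hat)
```

For example, with $w_s = 4$ and group values $M = 1, 1, 2, 0$, the estimate is $\hat{p} = 0.25$, and observing $m_{c'c} = 3$ gives $\Pr(M \ge 3) = 4(0.25)^3(0.75) + (0.25)^4 \approx .0508$.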

The $K_2$ index is defined as

$$K_2 = \Pr(M_{c'c} \ge m_{c'c}) = \sum_{w=m_{c'c}}^{w_s} \binom{w_s}{w} \hat{p}_2^{\,w} (1-\hat{p}_2)^{w_s - w}. \tag{7}$$

The $K_2$ index is an upper-tail probability. This probability can be compared with a chosen nominal significance level $\alpha$, such as .01. When it is less than or equal to this value, $c$ may be identified as having a pattern of responses unusually similar to that of $s$.

Sotaridona and Meijer (in press) showed that the detection rates of the $K_2$ index were in general higher than those of the $K$ index, whereas $\omega$ yielded the highest detection rates. Furthermore, both $K_2$ and $\omega$ were able to keep their empirical Type I errors below the nominal levels. Because the consequences of falsely identifying a noncopier as a copier are severe, we prefer an index whose Type I error is at or slightly below the nominal level.
Two New Indices



The $S_1$ Index

The $S_1$ index is similar to the $K_2$ index in that it is also based on the random variable $M$. The $S_1$ index differs from the $K_2$ index in the following ways. First, for the $K_2$ index, $M$ is assumed to follow a binomial distribution, whereas for $S_1$, $M$ is assumed to follow a Poisson distribution. Second, the Poisson parameter $\mu$ is estimated using a loglinear model, whereas in the $K_2$ index the binomial parameter $p$ is estimated using a linear regression model. The motivation for proposing the Poisson distribution for $M$, the loglinear model for estimating $\mu$, and a statistic for checking the adequacy of the loglinear model are discussed, respectively, in the next three subsections. Once the estimate of $\mu$ for number incorrect group $c'$, $\hat{\mu}_{c'}$, is obtained, the $S_1$ index is computed as
$$S_1 = \Pr(M_{c'c} \ge m_{c'c}) = \sum_{w=m_{c'c}}^{w_s} \frac{e^{-\hat{\mu}_{c'}}\,\hat{\mu}_{c'}^{\,w}}{w!}. \tag{8}$$

Equation (8) is the probability of observing $m_{c'c}$ or more matching incorrect answers under the fitted Poisson distribution. The smaller the value of $S_1$, the stronger the evidence of answer copying.
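The tail sum in Equation (8) can be sketched as below, assuming the sum runs from $m_{c'c}$ to $w_s$, the largest possible number of matching incorrect answers. Terms are evaluated on the log scale with `math.lgamma` to avoid overflow; the function name is hypothetical.

```python
import math

def poisson_tail_truncated(m, upper, mu):
    """Sum of Poisson(mu) probabilities for w = m, ..., upper.

    With upper = w_s and mu = the fitted mean for group c', this is S1.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if m > upper:
        return 0.0
    # each term exp(-mu) * mu**w / w! is evaluated on the log scale
    return sum(math.exp(-mu + w * math.log(mu) - math.lgamma(w + 1))
               for w in range(m, upper + 1))
```

Because the sum stops at $w_s$, it is slightly smaller than the full Poisson tail $\Pr(M \ge m_{c'c})$; the difference is negligible when $\hat{\mu}_{c'}$ is small relative to $w_s$.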

The Choice for the Distribution of M 

Several distributions have been assumed for the random variable $M$ by previous researchers dealing with copying indices. Bay (1995) used the compound binomial distribution in developing the $B_m$ copying index, in which all items in the item score pattern are considered. For the case where only the incorrect answers are considered, the ESA copying index (Belleza & Belleza, 1989), the $K$ index (Holland, 1996), and the $K_2$ index (Sotaridona & Meijer, in press) used the binomial distribution for $M$. Wollack (1997, p. 309) criticized the $B_m$ and ESA indices for their inability to adjust the probabilities associated with an examinee's responses as a function of test score. Wollack (1997) found that $B_m$ and ESA had lower detection rates than other indices based on classical test theory, such as the $g_2$ index (Frary et al., 1977). We did not include the $g_2$ index in this study because Wollack (1997) found that the Type I errors of $g_2$ are grossly inflated. Unlike $g_2$, the $K_2$ index is able to control its Type I error below its nominal level.

Recall that the responses of the source to a set of test items are considered fixed; given these responses, we count the number of wrong responses of the copier that match those of the source and call it $M$. Because the binomial distribution did not yield high detection rates for $K$ and $K_2$ (Lewis & Thayer, 1998; Sotaridona & Meijer, in press), we propose the $S_1$ index, which assumes a Poisson distribution as a reasonable approximation to the distribution of $M$. Hence, one may conceptualize $S_1$ as monitoring the rate or number of answer matches per incorrect answer by the source. If this rate is sufficiently high, this provides evidence of answer copying. The extent to which the Poisson distribution approximates the distribution of $M$ was investigated empirically.

Model for Estimating $\mu$

To compute $S_1$ in Equation (8), we need $w_s$, $m_{c'c}$, and $\mu$. The values of $w_s$ and $m_{c'c}$ are known, whereas $\mu$ must be estimated. The mean of $M$ differs across ability levels. The values of $M$ are small if most of the examinees have high ability, because the number of items answered incorrectly by the copier that can match the items answered incorrectly by the source is small. On the other hand, if most of the examinees have low ability, the number of incorrectly answered items is large and the number of matched items is likely to be large. This information is taken into account in estimating $\mu$ by stratifying the examinees according to the number of wrong answers they obtained.

Because the Poisson distribution was assumed for $M$, it is standard practice to use the loglinear model to model the log of the mean of $M$ (Agresti, 1996, p. 73). This model allows $\mu$ to be nonlinearly related to the predictor variable, which in this case is the number of wrong answers. A study by Hanson (1994) revealed that the loglinear model is satisfactory for modeling $M$ when $M$ is assumed to follow a compound binomial distribution.

The relevant data for estimating $\mu$ are the number of wrong answers and the mean number of matching incorrect answers for each number incorrect group $r$. Let $\mu_r$ denote the expected value of the Poisson variate $M_{rj}$, and let $w_r$ denote the number of wrong answers that defines group $r$. The loglinear model has the form

$$\log(\mu_r) = \beta_0 + \beta_1 w_r, \quad \forall r, \tag{9}$$

where $\beta_0$ is the intercept term (the logarithm of the mean of $M$ for a group with $w_r = 0$) and $\beta_1$ is the slope parameter. Estimation of $\beta_0$ and $\beta_1$ in Equation (9) is discussed in Agresti (1996, p. 93).

To obtain $S_1$, we need to determine the fitted mean for the number incorrect group to which the copier belongs. This fitted mean is

$$\hat{\mu}_{c'} = \exp(\hat{\beta}_0 + \hat{\beta}_1 w_{c'}). \tag{10}$$
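Equations (9) and (10) can be fitted with iteratively reweighted least squares (IRLS), a standard way of maximizing a Poisson log-likelihood with a log link. The sketch below is pure Python; the starting values ($\mu_r = \bar{M}_r + 0.1$), the optional weighting of groups by their sizes $J_r$, and the function names are our assumptions, since the text does not say whether groups were weighted.

```python
import math

def fit_loglinear(w, y, weights=None, max_iter=100, tol=1e-12):
    """Fit log(mu_r) = b0 + b1 * w_r (Equation 9) by IRLS.

    w: number of wrong answers that defines each group r.
    y: observed mean of M in each group.
    weights: optional group sizes J_r (equal weights if None).
    Returns (b0, b1).
    """
    if weights is None:
        weights = [1.0] * len(w)
    mu = [yy + 0.1 for yy in y]  # starting means (avoid log(0) for empty groups)
    eta = [math.log(m) for m in mu]
    b0 = b1 = None
    for _ in range(max_iter):
        # working response and working weights of one IRLS step
        z = [e + (yy - m) / m for e, yy, m in zip(eta, y, mu)]
        a = [wt * m for wt, m in zip(weights, mu)]
        sa = sum(a)
        sx = sum(ai * x for ai, x in zip(a, w))
        sxx = sum(ai * x * x for ai, x in zip(a, w))
        sz = sum(ai * zi for ai, zi in zip(a, z))
        sxz = sum(ai * x * zi for ai, x, zi in zip(a, w, z))
        # weighted least squares of z on (1, w)
        new_b1 = (sa * sxz - sx * sz) / (sa * sxx - sx * sx)
        new_b0 = (sz - new_b1 * sx) / sa
        done = b0 is not None and abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
        eta = [b0 + b1 * x for x in w]
        mu = [math.exp(e) for e in eta]
    return b0, b1

def fitted_mean(b0, b1, w_c):
    """Equation (10): fitted mean of M for the copier's group."""
    return math.exp(b0 + b1 * w_c)
```

At convergence the weighted residuals satisfy the Poisson score equations, so the fitted means reproduce the weighted total of the observed group means.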
Model Checking

The fit of the loglinear model in Equation (9) was investigated using the likelihood-ratio goodness-of-fit statistic, $G^2$ (Agresti, 1996, p. 89). The $G^2$ statistic can be used to test the null hypothesis that the model fits the data against the alternative that it does not. Let $\bar{M}_r$ be the observed mean number of matching incorrect answers in number incorrect group $r$, and let $\hat{\mu}_r$ be the corresponding fitted mean. The $G^2$ statistic is given by

$$G^2 = 2 \sum_{r=1}^{R} \bar{M}_r \log\left(\frac{\bar{M}_r}{\hat{\mu}_r}\right).$$

If the model fits the data perfectly, $\hat{\mu}_r = \bar{M}_r$ for every group. In that case, each logarithm equals 0 and consequently $G^2 = 0$. The distribution of $G^2$ is approximately chi-squared, with degrees of freedom equal to $R$ minus the number of model parameters. For the loglinear model in Equation (9), the number of model parameters is 2. The p-value for testing the null hypothesis is the right-tail probability. Large values of $G^2$, or small p-values (for example, less than .01), suggest a poor model fit (Agresti, 1996, p. 89). If the fit of the model to the data is poor, it would not be appropriate to use Equation (8) as a statistical test of answer copying.
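A pure-Python sketch of the goodness-of-fit check follows. It uses the general Poisson deviance $2\sum[\bar{M}_r\log(\bar{M}_r/\hat{\mu}_r) - (\bar{M}_r - \hat{\mu}_r)]$, which reduces to $2\sum \bar{M}_r\log(\bar{M}_r/\hat{\mu}_r)$ whenever the observed and fitted values have equal sums, as they do for an unweighted loglinear fit with an intercept. The chi-squared right-tail probability for integer degrees of freedom uses closed-form expressions for the regularized upper incomplete gamma function; the function names are hypothetical.

```python
import math

def g_squared(observed, fitted):
    """Poisson deviance G^2 over the R groups (0 * log 0 taken as 0)."""
    total = 0.0
    for o, f in zip(observed, fitted):
        if o > 0:
            total += o * math.log(o / f)
        total -= o - f
    return 2.0 * total

def chi2_sf(x, df):
    """Right-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # even df: exp(-y) * sum_{i=0}^{df/2 - 1} y**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # odd df: start from df = 1 (erfc) and add half-integer gamma terms
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y + (j - 0.5) * math.log(y) - math.lgamma(j + 0.5))
    return total

def loglinear_fit_pvalue(observed, fitted, n_params=2):
    """G^2 and its p-value with R - n_params degrees of freedom."""
    g2 = g_squared(observed, fitted)
    return g2, chi2_sf(g2, len(observed) - n_params)
```

A p-value below the chosen cutoff (e.g., .01) would indicate that Equation (8) should not be relied on for that dataset.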

The $S_2$ Index

Copying indices that are based solely on the matching incorrect answers, such as the $K$ and $K_2$ indices, discard the additional information about copying that is available in the matching correct answers. By excluding the number of matching correct answers from the analysis of answer copying, we implicitly assume that $c$ knows the answer to item $i$ whenever $c$ and $s$ both give a correct response to item $i$. However, this is not always the case. An examinee may obtain the correct answer to an item by copying or by guessing.

Note that the $K$ and $K_2$ indices are not sensitive to a copier who copies only the correct answers of the source. This may be the case when $s$ and $c$ are friends and $s$ shares with $c$ the answers to items on which $s$ is almost sure of the correct answer. Another example is a high-stakes examination in which $c$ may bribe $s$ to share the items that $s$ answered correctly.

The new copying index $S_2$ is proposed to overcome this limitation. It incorporates the information about copying that is contained in the matching correct answers, in addition to the information in the matching incorrect answers. Note that, as used in $K$ and $K_2$, the evidence of answer copying on an item is 1 if $s$ and $c$ choose the same wrong option, and 0 if they are both correct or their responses do not match. For $S_2$, however, the evidence is 1 if $s$ and $c$ choose the same wrong option, $\delta$ (described below) if the source and the copier are both correct, and 0 otherwise. The variable $\delta$ quantifies the amount of correct-answer copying information on an item for a particular source-copier pair.

Let $i^*$ denote an item that was answered correctly by $s$, and let $U_{i^*rj}$ be the response of examinee $rj$ to item $i^*$. Then $\delta_{i^*rj}$ gives the estimate of copying information on item $i^*$ for examinee $rj$. The value of $\delta_{i^*rj}$ satisfies the inequality

$$0 \le \delta_{i^*rj} \le 1;$$



that is, $\delta_{i^*rj} = 0$ if $rj$ knows the correct answer to item $i^*$, and $\delta_{i^*rj} = 1$ if $rj$ is completely ignorant about the correct answer to item $i^*$ (see conditions 1-2 below). The problem is to quantify the amount of knowledge that $rj$ has about $i^*$. To do this, we have to obtain the probability of $rj$ answering item $i^*$ correctly. This probability could be estimated as the proportion of all examinees answering item $i^*$ correctly. A drawback of this approach is that the resulting estimate of copying information is highly dependent on the population of examinees taking the test. For example, it would tend to be low if the examinees are of high ability and high if they are of low ability. A sensible solution is to condition on the ability level of the suspected copier, that is, to use the proportion of examinees in number incorrect group $r$ answering item $i^*$ correctly. For the rest of the presentation, unless specified otherwise, $j$ will refer to an examinee who belongs to a particular number incorrect group.

Let $P_{i^*rj}$ denote the probability of $rj$ giving the correct answer to item $i^*$, and let $A_{i^*rj}$ be an indicator variable equal to 1 if $U_{i^*rj} = U_{i^*s}$, and 0 otherwise. Note that $P_{i^*rj}$ is a conditional probability, not to be mistaken for the joint probability that $s$ and $rj$ give a common response to item $i^*$. Given $U_{i^*s}$, this probability is

$$P_{i^*rj} = \Pr(U_{i^*rj} = U_{i^*s} \mid U_{i^*s}), \tag{11}$$



and the maximum likelihood estimate of $P_{i^*rj}$ is

$$\hat{P}_{i^*rj} = \frac{\sum_{j=1}^{J_r} A_{i^*rj}}{J_r}. \tag{12}$$



Given the estimate of $P_{i^*rj}$, what remains is to transform this estimate into $\delta_{i^*rj}$. A suitable transformation function, $f(P_{i^*rj})$, should satisfy the following conditions:

1. $f(P_{i^*rj})$ approaches 0 as $P_{i^*rj}$ approaches 1; that is, the evidence of answer copying diminishes as $P_{i^*rj}$ approaches 1.

2. $f(P_{i^*rj})$ approaches 1 as $P_{i^*rj}$ approaches 0; that is, the evidence of answer copying approaches 1 if the suspected copier answers an item correctly despite a low probability of doing so.

3. Tests with different numbers of options must have different weight functions. Let $f$ and $f'$ be two different weight functions, and let $i^*$ and $i^{*\prime}$ be items taken from two tests with numbers of options $V$ and $V'$, such that $V < V'$. Then $f(P_{i^*rj}) > f'(P_{i^{*\prime}rj})$ whenever $P_{i^*rj} = P_{i^{*\prime}rj}$.

The basis for conditions 1-2 should be clear from the above discussion. Condition 3 arises from the idea that multiple-choice tests with different numbers of options should have transformation functions that differ by a factor that is a function of the number of options. This calls for a function that uses the probability of guessing the correct answer to an item as a scaling factor.

For notational convenience, let $g$ denote the probability of getting the correct answer to item $i$ by guessing. An often-used value of $g$ is .20 for a 5-option test and .25 for a 4-option test. A sensible function satisfying conditions 1-3 is shown in Equation (13):

$$\delta_{i^*rj} = f(P_{i^*rj}) = \left(\frac{1}{e}\right)^{g^{-2} P_{i^*rj}}. \tag{13}$$

Equation (13) is a monotone decreasing function of $P_{i^*rj}$, with $g$ a scaling constant. Figure 1 shows the graph of Equation (13) with $g = .2$ (denoted as F1, 5 options) and $g = .25$ (denoted as F2, 4 options). As shown in the graph, the value of $\delta_{i^*rj}$ for both F1 and F2 approaches zero as $P_{i^*rj}$ approaches 1, and approaches 1 as $P_{i^*rj}$ approaches zero (conditions 1-2). Furthermore, $F1(P_{i^*rj}) < F2(P_{i^*rj})$ for $P_{i^*rj} \in (0, 1]$ (condition 3).

[Insert Figure 1 about here]

Let $M^*_{rj}$ denote the sum of the number of matching incorrect answers and the weighted matching correct answers of examinee $rj$ and examinee $s$. The expression for $M^*_{rj}$ is given by

$$M^*_{rj} = \sum_{i \in W_s} A_{irj} + \sum_{i^*} \delta_{i^*rj} A_{i^*rj}, \tag{14}$$

where the second sum runs over the items $i^*$ answered correctly by $s$.
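Equation (12) and the weight in Equation (13) can be sketched as follows. The weight function is written as $\delta = (1/e)^{P/g^2} = e^{-P/g^2}$; this is our reading of Equation (13), and the docstring lists only the properties required by conditions 1-3, so a different published form could be swapped in without changing the rest of the sketch. Function names are hypothetical.

```python
import math

def p_hat_correct(group_responses, item, source_option):
    """Equation (12): share of group r whose answer to item i* equals the
    source's (correct) answer."""
    hits = sum(1 for resp in group_responses if resp[item] == source_option)
    return hits / len(group_responses)

def delta_weight(p, g):
    """Our reading of Equation (13): delta = (1/e) ** (p / g**2).

    Equals 1 at p = 0, is close to 0 at p = 1, decreases in p, and is larger
    for a larger guessing probability g (fewer options).
    """
    return math.exp(-p / g**2)
```

For instance, at $P = 0.5$ this form gives $e^{-12.5} \approx 3.7 \times 10^{-6}$ for $g = .20$ and $e^{-8} \approx 3.4 \times 10^{-4}$ for $g = .25$, consistent with condition 3; the weight is appreciable only for items that few examinees in the group answer correctly.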



In Equation (14), the contribution of each item to the value of $M^*_{rj}$ is 0 if the response of $rj$ does not match that of $s$, 1 if the wrong response of $rj$ matches that of $s$, and $\delta_{i^*rj}$ if the correct response of $rj$ matches that of $s$. The value of $M^*_{rj}$ will be large if most of the incorrect responses of $rj$ match the wrong responses of $s$, or if $P_{i^*rj}$ is small and most of the correct responses of $rj$ match the correct responses of $s$. The larger the value of $M^*_{rj}$ relative to the number of items, the stronger the evidence of answer copying.

Note that if there are no matching correct answers between $s$ and $rj$, the second term in Equation (14) sums to zero and $M^*_{rj} = M_{rj}$. Hence, $M_{rj}$ is a special case of $M^*_{rj}$. On the other hand, if there are no matching incorrect answers but only matching correct answers, then $M_{rj} = 0$ and $M^*_{rj} = \sum_{i^*} \delta_{i^*rj} A_{i^*rj}$. Thus, while $M_{rj}$ is sensitive only to incorrect-answer copying, $M^*_{rj}$ is sensitive to both correct- and incorrect-answer copying.

In reality, $M^*_{rj}$ is a nonnegative real-valued random variable. We treat $M^*_{rj}$ as an integer by rounding it to the nearest integer. Although this introduces some error, we expect it to have only a minor influence on the effectiveness of the statistic. As for $M_{rj}$, we used the Poisson distribution for $M^*_{rj}$ and the loglinear model to estimate its mean. We explored the usefulness of the Poisson distribution for modeling $M^*_{rj}$ empirically, using the $G^2$ statistic.

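The $S_2$ index is then defined as

$$S_2 = \Pr(M^*_{c'c} \ge m^*_{c'c}) = \sum_{w=m^*_{c'c}}^{I} \frac{e^{-\hat{\mu}^*_{c'}}\,(\hat{\mu}^*_{c'})^{w}}{w!}, \tag{15}$$

where $M^*_{c'c}$, with realization $m^*_{c'c}$, is the sum of the number of matching incorrect answers and weighted matching correct answers between $c$ and $s$, and $\hat{\mu}^*_{c'}$ is the fitted mean of $M^*$ for number incorrect group $c'$, obtained from the loglinear model. The smaller the value of $S_2$, the more likely it is that answer copying occurred.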
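A sketch of Equations (14) and (15) for a single examinee follows. It uses the weight $e^{-P/g^2}$ as our reading of Equation (13), rounds $M^*$ half up (the text says only "nearest integer"), and sums the Poisson terms up to the test length $I$; these choices and the function names are assumptions of the sketch. The fitted mean $\hat{\mu}^*_{c'}$ would come from the loglinear model fitted to the $M^*$ values.

```python
import math

def delta_weight(p, g):
    """Assumed form of Equation (13): (1/e) ** (p / g**2)."""
    return math.exp(-p / g**2)

def m_star(resp, src, key, p_hat, g):
    """Equation (14), rounded half up to an integer.

    resp, src: responses of examinee rj and of the source s.
    key: correct option per item.
    p_hat: {item: estimated P_i*rj} for items the source answered correctly.
    g: guessing probability (e.g., 0.20 for five options).
    """
    total = 0.0
    for i, (u, u_s, k) in enumerate(zip(resp, src, key)):
        if u != u_s:
            continue                            # no match: adds 0
        if u_s != k:
            total += 1.0                        # matching wrong answer: adds 1
        else:
            total += delta_weight(p_hat[i], g)  # matching correct answer: adds delta
    return math.floor(total + 0.5)

def s2_index(m_star_cc, mu_star, n_items):
    """Equation (15): Poisson(mu_star) probabilities from m* up to I."""
    if mu_star <= 0:
        raise ValueError("mu_star must be positive")
    return sum(math.exp(-mu_star + w * math.log(mu_star) - math.lgamma(w + 1))
               for w in range(m_star_cc, n_items + 1))
```

In this sketch an examinee who shares two wrong answers with the source and one correct answer that almost nobody in the group got right receives $M^* = 3$, whereas the same correct match on an easy item adds almost nothing.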
Method

Data Generation and Simulation of Copying 

The data were simulated in the same way as in Sotaridona and Meijer (in press). Multiple-choice items with five options were considered, with test lengths of 40 and 80 items, and samples of 100 and 500 simulees were generated. Item parameters were chosen in accordance with the study by Wollack (1997). As described in Wollack (1997), the item parameters were estimated under the nominal response model using MULTILOG (Thissen, 1991) for an 80-item, 5-alternative English college placement test and a 40-item, 5-alternative mathematics college placement test used at a large Midwestern research university. We drew the ability parameter, $\theta$, from $N(0, 1)$. Given the item and ability parameters, $P_{iv}(\theta)$ was computed based on the nominal response model. Each item response was drawn randomly from $v = 1, 2, \ldots, V$, with probabilities $P_{i1}(\theta), P_{i2}(\theta), \ldots, P_{iV}(\theta)$, respectively. The source was then drawn at random from the simulees with ability percentile ranks between 40 and 90. In both the 40- and 80-item tests, five percent of the simulees were randomly selected as copiers from among those with $\theta$ levels below the $\theta$ level of the source. The percentages of items copied were 10, 20, 30, and 40.

We crossed three factors, sample size (2 levels), number of items (2 levels), and percentage of items copied (4 levels), resulting in $2 \times 2 \times 4 = 16$ testing conditions. The dataset in each condition was replicated 100 times.

Similar to Wollack (1997), copying was simulated by first randomly selecting a 
specified percentage of items from the copier and then altering the responses of c to match 
the responses of s on those items. 
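The data-generation design described above can be sketched in Python as follows. Wollack's (1997) MULTILOG item parameter estimates are not reproduced in this paper, so the example draws hypothetical nominal response model parameters at random; the function names, the seed handling, and the mapping of the 40-90 percentile range to sorted positions are likewise illustrative assumptions.

```python
import math
import random

def nrm_probs(theta, zeta, lam):
    """Nominal response model option probabilities for one item."""
    logits = [z + a * theta for z, a in zip(zeta, lam)]
    top = max(logits)
    expo = [math.exp(x - top) for x in logits]
    total = sum(expo)
    return [e / total for e in expo]

def simulate(n_examinees, items, pct_copied, copier_rate=0.05, rng=None):
    """One replication of the design (hypothetical helper).

    Returns (responses, source index, list of copier indices).
    """
    if rng is None:
        rng = random.Random()
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]  # theta ~ N(0, 1)
    responses = [
        [rng.choices(range(len(zeta)), weights=nrm_probs(t, zeta, lam))[0]
         for zeta, lam in items]
        for t in thetas
    ]
    # source: random simulee with ability percentile rank between 40 and 90
    order = sorted(range(n_examinees), key=lambda j: thetas[j])
    source = rng.choice(order[int(0.40 * n_examinees):int(0.90 * n_examinees)])
    # copiers: random 5% of the simulees whose theta is below the source's
    lower = [j for j in range(n_examinees) if thetas[j] < thetas[source]]
    copiers = rng.sample(lower, round(copier_rate * n_examinees))
    # copying: overwrite a fixed percentage of randomly chosen items
    n_copy = round(pct_copied * len(items))
    for c in copiers:
        for i in rng.sample(range(len(items)), n_copy):
            responses[c][i] = responses[source][i]
    return responses, source, copiers

# hypothetical item parameters standing in for Wollack's (1997) estimates
rng = random.Random(1)
items = [([rng.gauss(0, 1) for _ in range(5)], [rng.gauss(0, 1) for _ in range(5)])
         for _ in range(40)]
responses, source, copiers = simulate(100, items, 0.20, rng=random.Random(2))
```

Repeating `simulate` 100 times per cell of the $2 \times 2 \times 4$ design would mirror the replication scheme described above.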

Type I Error and Detection Rates 

A simulee was identified as a copier by $K_2$, $S_1$, or $S_2$ if the value of the index was less than or equal to the significance level $\alpha$. The $\alpha$ levels were set to .0001, .0005, .001, .005, and .01; these are the $\alpha$ levels used in Wollack (1997), with the exclusion of .05, .10, and .0025.
For the $\omega$ statistic, a simulee was identified as a copier when the value of $\omega$ was above the one-tailed critical value corresponding to the right tail of the standard normal distribution. The $\omega$ statistic was computed using the item and ability parameters that were used in the simulation. This was done partly for convenience and partly because Wollack and Cohen (1998) showed that the Type I error rate of $\omega$ is unaffected by estimating the item and ability parameters. As in Sotaridona and Meijer (in press), the copying indices were computed based on prior suspicion of a particular simulee copying from a specific source. Hence, the statistics were tested for significance without adjustment of the $\alpha$ level.
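The decision rule for $\omega$ can be sketched with Python's standard-library `statistics.NormalDist`; the helper names are hypothetical. The critical value for a one-tailed test at level $\alpha$ is the $(1-\alpha)$ quantile of the standard normal distribution.

```python
from statistics import NormalDist

ALPHA_LEVELS = [0.0001, 0.0005, 0.001, 0.005, 0.01]

def omega_critical(alpha):
    """Right-tail critical value z_(1 - alpha) of the standard normal."""
    return NormalDist().inv_cdf(1.0 - alpha)

def flag_omega(omega, alpha):
    """Flag the pair when omega exceeds the one-tailed critical value."""
    return omega > omega_critical(alpha)

critical_values = {a: round(omega_critical(a), 3) for a in ALPHA_LEVELS}
```

For $\alpha = .01$ the critical value is about 2.326, and for $\alpha = .0001$ about 3.719.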

To determine the empirical Type I error rate, we computed the proportion of noncopier simulees who were identified by the copying index as copiers. This computation was based on 9,400 noncopiers (94 noncopiers per replication × 100 replications) for datasets with 100 examinees, and 47,400 noncopiers (474 noncopiers per replication × 100 replications) for datasets with 500 examinees.

Likewise, the detection rate was obtained by taking the proportion of true copier simulees who were classified as copiers by $K_2$, $S_1$, $S_2$, and $\omega$. This computation was based on 500 true copiers (5 true copiers per replication × 100 replications) for datasets with 100 examinees, and 2,500 true copiers (25 true copiers per replication × 100 replications) for datasets with 500 examinees. Ideally, we want an index that minimizes the Type I error rate and maximizes the detection rate.
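The bookkeeping for these rates can be sketched as follows. The per-replication count of 94 noncopiers ($100 - 5 - 1$) suggests that the source is counted neither as a copier nor as a noncopier; that is our reading, and the helper names are hypothetical.

```python
def design_counts(n_examinees, replications=100, copier_rate=0.05):
    """Noncopiers and true copiers pooled over replications.

    Assumes the source is neither a copier nor a noncopier in the count.
    """
    copiers = round(copier_rate * n_examinees)
    noncopiers = n_examinees - copiers - 1
    return noncopiers * replications, copiers * replications

def empirical_rates(results, alpha):
    """Type I error and detection rate for a p-value index (K2, S1, S2).

    results: (p_value, is_true_copier) pairs pooled over replications;
    a pair is flagged when its p-value is <= alpha.
    """
    null = [p for p, is_copier in results if not is_copier]
    alt = [p for p, is_copier in results if is_copier]
    type_one = sum(p <= alpha for p in null) / len(null)
    detection = sum(p <= alpha for p in alt) / len(alt)
    return type_one, detection
```

With these assumptions, `design_counts(100)` returns `(9400, 500)` and `design_counts(500)` returns `(47400, 2500)`, matching the totals reported above.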

Results 



Adequacy of the Loglinear Model 

The fit of the loglinear model given in Equation (9) was assessed using the $G^2$ statistic. The results were similar for $M$ and $M^*$, and also for the 40- and 80-item tests, so only the results for $M^*$ with the 40-item test are presented and discussed here.

Figure 2 shows scatter plots of 100 p-values (x-axis) by rank (y-axis) for the 40-item test with 100 and 500 simulees and for different percentages of items copied. Remember that the null hypothesis being tested is that the loglinear model fits the data; large p-values therefore support the null hypothesis.

[Insert Figure 2 about here] 

The loglinear model fits the data very well in every simulated situation, as reflected by the high p-values for both $J = 100$ and $J = 500$. For example, at $J = 100$, the minimum p-value is 0.332 for 10% copying, 0.418 for 20% copying, 0.182 for 30% copying, and 0.481 for 40% copying (Figure 2a-d), whereas at $J = 500$, all p-values are nearly 1 across the four percentages of copying (Figure 2e-h). The fit of the model is quite similar for different percentages of copying.

Type I Error Rate 

Figure 3 shows the empirical Type I errors of $\omega$, $K_2$ (denoted as K2), $S_1$, and $S_2$ (denoted as S1 and S2) for different $\alpha$ levels and across combinations of sample sizes and test lengths. The solid line in the graph is a boundary line indicating perfect agreement between the nominal and empirical Type I errors. A copying index whose Type I errors lie above the boundary line is liberal in classifying simulees as copiers, and one whose Type I errors lie below it is conservative. An ideal copying index maintains its Type I error at or slightly below the nominal $\alpha$ level, but not too far below; otherwise, its detection rate will be reduced.

[Insert Figure 3 about here] 

The $S_2$ index holds its Type I error for $J = 100$ (Figure 3a-b) and tends to be slightly liberal for $J = 500$ (Figure 3c-d). The $\omega$ index, on the other hand, is slightly liberal for $J = 100$ and slightly conservative for $J = 500$. Both $S_1$ and $K_2$ held their Type I errors below the nominal levels, in most cases below the Type I errors of $S_2$ and $\omega$. The most conservative index was $S_1$ for $J = 100$ and $K_2$ for $J = 500$.

Detection Rate 

The detection rates for $K_2$, $S_1$, and $S_2$ (denoted K2, S1, and S2, respectively) and $\omega$ for different $\alpha$ levels, percentages of copying, and test lengths are shown in Figure 4 for 100 simulees and in Figure 5 for 500 simulees. The detection rates of all the indices increased with the percentage of copying. For example, for 40 items and 100 simulees, the detection rates in Figure 4a (40% copying) are higher than those in Figure 4b (30% copying), which are both higher than those in Figure 4c (20% copying) and Figure 4d (10% copying). Similar trends were observed for the other combinations of sample sizes and test lengths.



[Insert Figures 4 and 5 about here] 

Consistent with the findings of Wollack (1997) and Sotaridona and Meijer (in press), the detection rate of $\omega$ increased with test length but not with sample size. For example, Figure 4 shows that for 100 examinees, the detection rate of $\omega$ was higher for the 80-item test than for the 40-item test. The same was observed for 500 examinees (Figure 5). For a fixed test length, changing the sample size from 100 to 500 did not change the detection rate of $\omega$ (compare Figure 4 with Figure 5).

On the other hand, test length, sample size, or both affected the detection rates of $K_2$, $S_1$, and $S_2$. In particular, increasing the test length (compare Figure 4a-d with Figure 4e-h, and Figure 5a-d with Figure 5e-h) or the sample size (compare Figure 4 with Figure 5) resulted in increased detection rates. In what follows, we compare the detection rates of the four indices. We should keep in mind, however, that the empirical Type I errors are not exactly equal, although the differences are small.

None of the indices is best in all testing conditions considered. The $S_2$ index outperformed the other indices when the amount of copying was 20% or 10%, regardless of test length and sample size (Figure 4c-d, g-h and Figure 5c-d, g-h), and when the amount of copying was 30% or 40% with 40 items and 500 simulees (Figure 5a-b).

Furthermore, with 30% or 40% copying, the S₂ index performed about as well as the ω index with 40 items and 100 simulees (Figure 4a-b) and with 80 items and 500 simulees (Figure 5e-f). The ω index had the highest detection rates with 80 items and 100 simulees (Figure 4e-f). All four indices were equally effective, with detection rates near 100%, for α = .01, 40% copying, and 80 items (see Figures 4e and 5e). In general, the K₂ index had the lowest detection rates of the four indices.

Discussion 

We proposed the S₁ index as an alternative to the K₂ index for detecting answer copying. In S₁, the Poisson distribution is used instead of the binomial distribution to model M, the number of matching incorrect answers. We also proposed the S₂ index to overcome a limitation shared by K₂ and S₁: neither is sensitive to the copying of correct answers. Crucial to the application of S₁ and S₂ is obtaining reliable estimates of the means of M and M*. We addressed this by using a loglinear model, and we evaluated its fit with the G² statistic. The Type I errors and detection rates of S₁ and S₂ were compared with those of K₂ and ω.
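The G² statistic mentioned above is the likelihood-ratio fit statistic for a loglinear model:

G² = 2 Σ O ln(O / E),

where the sum runs over the cells of the table, O is the observed count, and E is the expected count under the fitted model. The minimal Python sketch below assumes the fitted expected counts are already available from whatever loglinear fitting routine is used. The fitting step is not shown, and the function name is illustrative.

```python
import math
from typing import Sequence


def g_squared(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)) over all cells.

    Cells with O == 0 contribute 0, the limit of O * ln(O / E) as O -> 0.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        if o > 0:
            total += o * math.log(o / e)
    return 2.0 * total
```

A model that reproduces the observed counts exactly gives G² = 0. A p-value is obtained by referring G² to a chi-square distribution with the model's residual degrees of freedom.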

The results did not provide convincing evidence against using the Poisson distribution for M and M*. In particular, using the Poisson distribution instead of the binomial distribution resulted in S₁ having considerably higher detection rates than K₂. The S₂ index incorporates information from the matching correct answers in addition to the matching incorrect answers, and it led to a significant improvement in detection rate over S₁. In general, a copying index is not sensitive when only a few item scores are copied. Even so, this initial study shows that S₂ offered a noticeable improvement over the best-performing competing index, ω, when 20% or less of the items were copied.
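The difference between S₁ and K₂ discussed above lies in the reference distribution for M. The Python sketch below treats an index as the upper-tail probability of the observed number of matching incorrect answers m. It computes that tail under a Poisson model with mean μ and under a binomial model with n trials and success probability p. Both μ and (n, p) are taken as given here. In the paper the means are estimated from a loglinear model, and the function names and parameterization are illustrative rather than the exact definitions of S₁ and K₂. For the same mean (μ = np), the Poisson variance μ exceeds the binomial variance np(1 − p), so the two tail probabilities generally differ.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu), computed in log space to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poisson_upper_tail(m: int, mu: float) -> float:
    """P(X >= m) for X ~ Poisson(mu)."""
    if mu < 0:
        raise ValueError("mu must be non-negative")
    if m <= 0:
        return 1.0
    if mu == 0:
        return 0.0  # X is always 0, so X >= m >= 1 is impossible
    if m <= mu:
        # The tail is not small here, so the complement is numerically safe.
        return max(0.0, 1.0 - sum(poisson_pmf(k, mu) for k in range(m)))
    # For k >= m > mu the terms decrease, so sum upward until negligible.
    total, k = 0.0, m
    while True:
        term = poisson_pmf(k, mu)
        total += term
        k += 1
        if term <= total * 1e-17:
            break
    return total


def binomial_upper_tail(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p); 0.0 when m > n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return sum((math.comb(n, k) * p**k * (1 - p) ** (n - k)
                for k in range(max(m, 0), n + 1)), 0.0)
```

For example, `poisson_upper_tail(6, 2.0)` and `binomial_upper_tail(6, 20, 0.1)` give the tail probability of six matches under two models that share a mean of 2.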

As shown in this study and in Sotaridona and Meijer (in press), ω seems to be the best choice for detecting answer copying if the item parameters of the nominal response model can be estimated reliably. It is sensitive across all ability levels of the copier, and it can also be used on examinations taken only by the source and the copier. S₁ and S₂ cannot be used in this latter case. However, given its computational simplicity and less restrictive assumptions, S₂ may be a good alternative in practice.

The results concerning the Type I errors of ω and K₂ were also consistent with those of a previous study (Sotaridona & Meijer, in press), which showed that ω is slightly liberal at J = 100 whereas K₂ is conservative. Although the empirical Type I errors of the four indices do not agree perfectly with the nominal Type I errors, the deviations are small.

The present study considered only a 5% concentration of copiers, and the items copied were selected at random. There is some indication that the size of the difference between the ability levels of the source and the copier affects the performance of a copying index. Future research could study the Type I errors and detection rates of S₁ and S₂ under several conditions:
- different modes of answer copying
- different concentrations of copiers
- different percentages of copied correct answers
- various ability levels of the source
Figure Captions



Figure 1. Graph of 8, as a Function of with p g = 0.25 and p g = 0.20 . 

Figure 2. Scatter Plots of 100 p-values of G² Statistics, Ranked in Increasing Order, for the 40-Item Test

Figure 3. Nominal and Empirical Type I Error Rate as a Function of Simulee Size and Test 
Length 

Figure 4. Detection Rates of ω, K₂, S₁, and S₂ on Tests with 100 Simulees

Figure 5. Detection Rates of ω, K₂, S₁, and S₂ on Tests with 500 Simulees